Consistent document engineering
Abstract
When a group of authors collaboratively edits interrelated documents, consistency problems occur almost immediately. Current document management systems (DMSs) provide useful mechanisms such as document locking and version control, but often lack consistency management facilities. At best, consistency is “defined” via informal guidelines, which do not support automatic consistency checks. In this thesis, we complement traditional DMSs with consistency management. We propose to use formal consistency rules that capture semantic consistency requirements. Rules are formalized in a variant of temporal logic. A static type system supports rule formalization, where types also define (formal) document models. In implementing a tolerant view of consistency, we do not expect documents to satisfy the consistency rules. Instead, our novel semantics precisely pinpoints inconsistent document parts and indicates when, where, and why documents are inconsistent. Speed is a key issue in consistency management. Therefore, we develop efficient techniques for consistency checking while retaining our tolerant semantics.
Just pinpointing inconsistencies is, however, insufficient for flexible consistency management. We extend our consistency checking approach towards suggesting repairs, which resolve inconsistencies. The critical issues are to suggest only some of the best (i.e., least costly) repairs and to generate repairs efficiently. Therefore, we develop a new two-step approach. First, we employ directed acyclic graphs (DAGs) to carry repairs. These graphs are called suggestion DAGs (S-DAGs for short). In contrast to the enumeration of all possible repairs, S-DAGs provide a suitable means to generate repairs efficiently and to limit the search space for good repairs. Second, from S-DAGs, we derive one repair collection for all consistency rules. Due to the separation of repair derivation from S-DAG generation, the repository is locked during the computationally cheap S-DAG generation only. We have implemented a prototype of a consistency management tool. Our case study in the field of software engineering shows that our contributions can significantly improve consistency management in document engineering and scale to a practically relevant problem size.
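To make the idea of a tolerant consistency check concrete, the following is a minimal illustrative sketch, not the thesis's temporal-logic formalism and not its S-DAG construction. It assumes a hypothetical repository modelled as a list of versions, each mapping a document name to the names it contains, and a single hypothetical rule ("every name used in one document must be defined in another"). Instead of returning true or false, the check reports when (version), where (document), and why each inconsistency occurs. The helper `cheapest_repairs` simply sorts hand-written repair candidates by an assumed cost; it stands in for, and is much simpler than, deriving repairs from S-DAGs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inconsistency:
    version: int    # when: repository version (1-based) in which the rule is violated
    document: str   # where: the document containing the offending name
    reason: str     # why: human-readable explanation


def check_references(repo, defined_in, used_in):
    """Tolerant check: collect every violation instead of failing on the first one.

    `repo` is a hypothetical repository: a list of versions, each a dict that
    maps a document name to the list of names appearing in that document.
    """
    report = []
    for version, docs in enumerate(repo, start=1):
        defined = set(docs.get(defined_in, []))
        for name in docs.get(used_in, []):
            if name not in defined:
                report.append(Inconsistency(
                    version, used_in,
                    f"'{name}' is not defined in {defined_in}"))
    return report


def cheapest_repairs(repairs, limit=2):
    """Return at most `limit` repairs with the lowest assumed cost.

    `repairs` is a list of (description, cost) pairs; the costs are
    illustrative assumptions, not values taken from the thesis.
    """
    return sorted(repairs, key=lambda repair: repair[1])[:limit]


# Hypothetical two-version repository: version 2 drops 'logout' from the spec
# while the manual starts to mention it.
repo = [
    {"spec": ["login", "logout"], "manual": ["login"]},
    {"spec": ["login"], "manual": ["login", "logout"]},
]
report = check_references(repo, defined_in="spec", used_in="manual")

# Hand-written repair candidates with assumed costs.
repairs = [
    ("rewrite the manual", 10),
    ("delete 'logout' from manual", 1),
    ("add 'logout' to spec", 2),
]
suggested = cheapest_repairs(repairs, limit=2)
```

Here `report` holds one entry for version 2 of the manual, and `suggested` keeps only the two least costly candidates, which mirrors the abstract's goal of suggesting only some of the best repairs rather than enumerating all of them.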
Similar resources
A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...
Integration Tools for Supporting Incremental Modifications within Design Processes in Chemical Engineering
Keeping the vast amount of inter-document relations consistent, which relate entities of different documents, is important for the quality and efficiency of design processes. Especially, these relations are eminent for master documents, as flowsheets (PFD) in chemical engineering, because management decisions are based on statements derived from PFDs. This paper introduces novel tools for an in...
Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback
Keyword Spotting is a well-known method in document image retrieval. In this method, Search in document images is based on query word image. In this Paper, an approach for document image retrieval based on keyword spotting has been proposed. In proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method is used in this paper to imp...
Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)
Document images produced by scanner or digital camera, usually suffer from geometric and photometric distortions. Both of them deteriorate the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions aiming to improve OCR results. Our methodology is based on finding text lines by dynamic local connectivity map and then applying a l...
European Association of Software Science and Technology
Development processes in engineering disciplines are inherently complex. Throughout the development process, different kinds of inter-dependent design documents are created which have to be kept consistent with each other. Graph transformations are well suited for modeling the operations provided for maintaining inter-document consistency. In this summary, we describe a novel approach to rule e...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...